Abstract. Neurodegenerative diseases are frequently associated with
structural changes in the brain. Magnetic Resonance Imaging (MRI)
scans can show these variations and can therefore be used as a supportive
feature for a number of neurodegenerative diseases. The hippocampus
is a known biomarker for Alzheimer's disease and other neurological
and psychiatric diseases. Its use, however, requires accurate, robust
and reproducible delineation of hippocampal structures. Fully automatic
methods usually follow a voxel-based approach, in which a number of
local features are calculated for each voxel. In this paper we compare
four different techniques for feature selection from a set of 315 features
extracted for each voxel: (i) a filter method based on the
Kolmogorov-Smirnov test; two wrapper methods, respectively (ii) Sequential
Forward Selection and (iii) Sequential Backward Elimination; and (iv) an
embedded method based on the Random Forest Classifier. The methods were
trained on a set of 10 T1-weighted brain MRIs and tested on an independent
set of 25 subjects. The resulting segmentations were compared with manual
reference labelling. By using only 23 features for each voxel (sequential
backward elimination) we obtained performances comparable to the
state of the art, represented by the standard tool FreeSurfer [1].

1 Introduction 

The analysis of medical images such as Magnetic Resonance Images (MRIs) is
useful to investigate and identify the structural alterations in the brain
that are frequently associated with dementia or neurodegenerative diseases.
In this context, hippocampal segmentation is used to study and detect the
correlation between morphological anomalies of the hippocampus and the
occurrence of Alzheimer's disease. Hence its importance is strictly related
to the early prediction of dementia [5]. Since manual tracing is
time-consuming and highly operator dependent, it is important to automate
this process as much as possible.

As discussed in [5], automatic image analysis and classification methods
exist that are able to recognize brain anomalies at the level of the single
patient, which is more useful than at the level of groups or categories of
individuals. Nonetheless, they potentially require a large number of
parameters (a vector of features) to properly handle the differences and
specific features of the human brain among individuals, causing the
parameter space to explode in terms of complexity, redundancy and noise.
Finding a limited number of features able to recognize patterns with a
sufficient level of accuracy, and without requiring a huge computational
effort, would indeed be very helpful. This is especially true when feature
selection and classification are performed by machine learning techniques,
since the intrinsic self-organizing selection of important features and of
their cross-correlations reduces the risk of a biased interpretation of the
feature space.

Several approaches have been proposed to reach different levels of
automation [?]. Among known methods, we mention only Morra et al. [?], who
propose different automatic methods based on Support Vector Machines (SVM)
and hierarchical AdaBoost, considering about 18,000 voxel features, and
FreeSurfer [1], a standard medical software tool for the analysis of
cortical and subcortical anatomy, which performs a segmentation on cortical
surface streams by constructing models of the boundaries between white and
gray matter.

Similarly, for automatic hippocampal segmentation we use a voxel-based
approach with 315 local features for each voxel included in a
para-hippocampal region larger than the hippocampus volume. Extracting 315
features for such a large number of voxels requires massive processing time
and computational resources. For this reason, we consider the issue of
Feature Selection (FS), or feature reduction, to be crucial. Feature
selection is useful: (a) to avoid overfitting, by minimizing the dimension
of the parameter space, and to improve model performance, i.e. prediction
performance in the case of supervised classification and better cluster
detection in the case of clustering; (b) to provide faster and more
cost-effective models; (c) to gain a deeper insight into the underlying
processes that generated the data; and (d) to optimize the processing time
and the use of computational resources.

There is a price to be paid for this advantage. Searching for a subset of
relevant features introduces an additional layer of complexity in the
modeling task: the optimal model parameters must be found for the optimal
feature subset, as there is no guarantee that the optimal parameters for the
full input feature set are equally optimal for the best feature subset [3,4].

Using a small number of features can reduce the computational time, which is
proportional to the number of features. Furthermore, in some cases it leads
to a better classification accuracy [10]. Reducing the number of features is
also necessary when only a limited number of examples is available to train
the classifier. In this regard, it has been shown that, for the same error
rate, a classifier requires a training whose duration grows exponentially
with the number of variables [?].


Feature reduction, therefore, includes any algorithm that finds a subset of
the input feature set. A feature reduction capability is also present in
more general methods based on transformations or combinations of the input
feature set (feature extraction algorithms). An example is the well-known
Principal Component Analysis (PCA), which eliminates the redundancy of
information by generating a new feature set as a combination of the input
features [?].

However, feature selection, by preserving the original semantics of the
features, also maintains a coherent interpretability. The main goal of this
study is to exemplify and demonstrate the benefits of applying FS algorithms
in the field of hippocampus segmentation.


2 Materials 

The database used to perform the described experiments is composed of
thirty-five T1-weighted whole brain MR images and the corresponding manually
segmented bilateral hippocampi (masks). All images were acquired on a 1.0 T
scanner according to the MP-RAGE sequence for magnetic resonance imaging of
the brain [?].

The images are derived from the Open Access Series of Imaging Studies
(OASIS). In particular we used 35 MP-RAGE MRI brain scans with a resolution
of 1 mm³, provided on the occasion of the MICCAI SATA challenge workshop
2013 [?]. By using this homogeneous data sample it was possible to reduce
the training image sub-sample without losing generality or learning
capability, while keeping a sufficiently wide test set to perform a
well-posed statistical analysis of the feature selection performances.

The image processing and classification were carried out blindly with respect 
to the subject status. 

The first stage of our analysis chain is an image pre-processing step that
standardizes the images both spatially and in gray intensity. This is
obtained by registering the images to the Montreal Neurological Institute
(MNI) standard template (ICBM152) using a 12-parameter affine registration,
and subsequently resampling them on an isotropic grid with 1 mm³ voxel size.

In order to reduce the computational time of the analysis, two volumes
containing the left and right hippocampus, including the relevant
para-hippocampal regions, are extracted from the spatially standardized MRI
using a new method, FAPoD (Fully automatic Algorithm Based on Point
Distribution Model), described in [?].

We can then proceed with the feature extraction only in this identified
region of interest: we approach a voxel-based binary classification problem,
where the categories are hippocampus and not-hippocampus, based on
supervised pattern recognition systems. The features should contain
information relevant to the classification task. Since manual segmentation
of the hippocampus is based on local texture information, we adopted the
related features. In the analysis presented here, for each voxel we obtained
a vector whose elements represent information about position, intensity,
neighboring texture [21], and local filters.


Texture information was expressed using both Haar-like and Haralick features.



The Haralick features were calculated from the normalized gray-level
co-occurrence matrices (GLCM) created on the m × m voxels projection
sub-images of the volume of interest; m defines the size of the overlapping
sliding windows. For each voxel, values of m varying from 3 to 9 were used.
Each element (k, p) of a co-occurrence matrix indicates the probability that
two voxels, separated by a specified spatial angle and distance, have gray
levels k and p respectively.

A subset of Haralick features is sufficient to obtain a satisfactory
discrimination. To establish which of the original 14 GLCM Haralick features
give the best recognition rate, several preliminary recognition experiments
were carried out [21]. The resulting best configuration was identified in
4 features: energy, contrast, correlation and inverse difference moment [20].
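The four Haralick features named above can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this work: the function names `glcm` and `haralick_four`, the symmetric counting of voxel pairs, and the choice of energy as the sum of squared GLCM entries (angular second moment) are assumptions made for illustration, and the window is a small 2D sub-image with already quantized gray levels.

```python
from math import sqrt

def glcm(window, dx, dy, levels):
    """Normalized, symmetric co-occurrence matrix of a 2D window (hypothetical helper)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                k, p = window[r][c], window[r2][c2]
                counts[k][p] += 1.0
                counts[p][k] += 1.0  # assumption: each pair is counted in both directions
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset does not fit inside the window")
    return [[v / total for v in row] for row in counts]

def haralick_four(P):
    """Energy, contrast, correlation and inverse difference moment of a GLCM."""
    idx = range(len(P))
    cells = [(i, j, P[i][j]) for i in idx for j in idx]
    energy = sum(v * v for _, _, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    idm = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    if var_i == 0 or var_j == 0:
        correlation = 0.0  # assumption: a constant window is given zero correlation
    else:
        cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
        correlation = cov / sqrt(var_i * var_j)
    return {"energy": energy, "contrast": contrast,
            "correlation": correlation, "idm": idm}

# 3x3 window with three gray levels, horizontal neighbour at distance 1
window = [[0, 0, 1], [0, 0, 1], [2, 2, 2]]
features = haralick_four(glcm(window, dx=1, dy=0, levels=3))
```

In the setting described here, such a computation would be repeated for each window size m, each orientation and each projection plane, which is consistent with the Haralick features being the most expensive group to extract.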

Finally, the gradients calculated in different directions and at different
distances were included as additional features. The best analysis
configuration, expressed by the highest values of the statistical indicators
(see Sec. 3), was obtained with 315 features, described in Table 1.

In summary, the Knowledge Base (KB) consisted of 35 regions of interest
(ROI) extracted from as many images, each one composed of 7910 voxels, where
each voxel is represented by a vector of 315 features. Therefore the
training set, including 10 randomly selected images, was formed by a total
of 79100 × 315 entries. In quantitative terms it can be considered a
sufficiently wide data set, qualitatively able to cover all feature types
needed to perform a complete training, while avoiding the useless redundancy
of information not needed by machine learning methods [25] and leaving a
sufficiently large number of samples for the test sessions.


number   description
1        position
1        grey level
66       Haralick features for mask 3x3
66       Haralick features for mask 5x5
66       Haralick features for mask 7x7
66       Haralick features for mask 9x9
49       Haar-like 3D features

Table 1. The 315 features extracted from the 3D MRI images. Of each group of 66
Haralick features, 13 are the gradients along the 13 diagonals, 5 the principal moments,
and the rest are the three sets of 16 textural features, one set for each plane of the
voxels. The gradients for each voxel are measured in all directions at one voxel distance
and the relative 3D positions are included as features.





3 Methods 


The FS techniques are usually grouped into three categories, based on how
they combine the selection and the classification of the reduced parameter
space. These categories are named wrapper, filter and embedded methods [55].

The filter method is a technique based on measuring the importance of each
single feature of the given parameter space [?]. The selected features are
the most relevant to obtain a correct classification. This category includes
methods suitable for high-dimensional datasets, since they are
computationally fast. Furthermore, they are independent of the
classification algorithm and therefore their results can be used for all
types of classifier. However, since each feature is considered separately
from the others, any positive contribution arising from the combined effect
of several features is neglected. The filter method used in our analysis is
based on the Kolmogorov-Smirnov (K-S) test.
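As an illustration of a univariate K-S filter, the sketch below computes the two-sample K-S statistic for each feature and keeps the features whose class distributions differ significantly. It is a sketch under stated assumptions, not the procedure used in this work: the names `ks_statistic`, `ks_pvalue` and `ks_filter` are hypothetical, the p-value uses a common asymptotic approximation of the Kolmogorov distribution, and a significance level of 0.05 is assumed as the selection threshold.

```python
from math import exp, sqrt

def ks_statistic(a, b):
    """Two-sample K-S statistic: largest distance between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n, m):
    """Asymptotic p-value (approximation; assumed adequate for large samples)."""
    ne = n * m / (n + m)
    lam = (sqrt(ne) + 0.12 + 0.11 / sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the alternating series converges poorly here; p is close to 1
    s = sum((-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * s))

def ks_filter(rows_a, rows_b, alpha=0.05):
    """Indices of features whose two class distributions differ at level alpha."""
    selected = []
    for f in range(len(rows_a[0])):
        a = [row[f] for row in rows_a]
        b = [row[f] for row in rows_b]
        if ks_pvalue(ks_statistic(a, b), len(a), len(b)) < alpha:
            selected.append(f)
    return selected

# Toy data: feature 0 separates the classes, feature 1 does not
hippo = [[float(i), float(i % 5)] for i in range(50)]
background = [[100.0 + i, float(i % 5)] for i in range(50)]
kept = ks_filter(hippo, background)
```

Because each feature is tested on its own, the filter is fast, but it cannot detect features that are informative only in combination with others.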

Wrapper methods integrate the two aspects of the workflow, i.e. the model
hypothesis and the feature search [28]. This procedure involves the
generation and evaluation of various subsets of features. Every generated
feature subset is evaluated through a classification criterion (hence the
name wrapper). Since the number of all possible feature subsets grows
exponentially with the number of features, search heuristics can be adopted
to drastically reduce the number of operations. They can be grouped into
deterministic and randomized search methods. The advantage of these methods
is the intrinsic interaction between the selected features and the
classifier, with the downside of a high computational cost and the risk of
overfitting. The wrapper methods used in our analysis are sequential forward
selection (SFS) and sequential backward elimination (SBE).
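A deterministic wrapper search can be sketched as follows for the forward direction. The function name `sequential_forward_selection` and the callable `score` are hypothetical: `score` stands for any criterion that evaluates a feature subset, for instance a cross-validated Dice index of a classifier trained on that subset. Here it is replaced by a toy function so that the sketch runs on its own.

```python
def sequential_forward_selection(n_features, score, max_features=None):
    """Greedy forward search: at each step add the feature that maximizes score."""
    selected = []
    remaining = set(range(n_features))
    limit = n_features if max_features is None else max_features
    history = []  # (subset, score) after each step
    while remaining and len(selected) < limit:
        # Candidates are scanned in index order so that ties are broken deterministically
        best = max(sorted(remaining), key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score(selected)))
    best_subset, best_score = max(history, key=lambda item: item[1])
    return best_subset, best_score, history

# Toy criterion (assumption): features 1 and 3 are informative, every feature has a cost
def toy_score(subset):
    return len(set(subset) & {1, 3}) - 0.1 * len(subset)

best_subset, best_score, history = sequential_forward_selection(5, toy_score)
```

The search evaluates on the order of n² subsets for n features instead of the 2ⁿ subsets of an exhaustive search, which is what makes the wrapper approach tractable, although still expensive when each evaluation requires training a classifier.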

Finally, in embedded methods the search for the optimal feature subset is
directly nested into the classifier algorithm [29]. Such techniques can be
interpreted in terms of a search within a combined parameter space, mixing
features and hypotheses. Analogously to wrapper methods, they include the
interaction with the classification algorithm, but in a faster way. The
embedded method used in our analysis is based on the Random Forest
Classifier.

To recap, in our FS analysis we used:

- Univariate filter method: Kolmogorov-Smirnov;

- Deterministic wrapper methods: sequential forward selection (SFS) and
sequential backward elimination (SBE);

- Embedded method: Random Forest.

In addition, for comparison we also used PCA [30], one of the most widely
adopted feature reduction techniques.

To estimate the goodness of the selected feature group we used the Naive
Bayes Classifier [31], based on the simplifying hypothesis that all
attributes describing a specific instance of the data are conditionally
independent of each other.
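To make the conditional independence hypothesis concrete, the following minimal sketch implements a Naive Bayes classifier in which each feature is modeled, within each class, by an independent Gaussian. The Gaussian likelihood, the small variance floor and the names `fit_gaussian_nb` and `predict` are assumptions made for illustration; the text does not specify which likelihood model was used.

```python
from math import log, pi

def fit_gaussian_nb(X, y):
    """Per-class log prior, feature means and feature variances."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor avoids zero variances (assumption, not from the paper)
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (log(n / len(y)), means, variances)
    return model

def predict(model, x):
    """Class with the largest log posterior under the independence hypothesis."""
    def log_joint(label):
        prior, means, variances = model[label]
        # Independence turns the joint likelihood into a sum of per-feature terms
        return prior + sum(-0.5 * log(2 * pi * s) - (v - m) ** 2 / (2 * s)
                           for v, m, s in zip(x, means, variances))
    return max(model, key=log_joint)

# Toy voxels: label 1 stands for hippocampus, label 0 for not-hippocampus
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
```

For two classes, choosing the class with the larger posterior is equivalent to thresholding the posterior probability of the hippocampus class at 0.5. Training only requires one pass over the data to collect per-feature statistics, which is why this classifier is cheap enough to be evaluated repeatedly inside a wrapper search.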


The FS analysis was performed in 5-fold cross validation on 10 of the 35
images in the database. The goodness of the selected group was tested on the
remaining 25 images. As already discussed in Sec. 2, the selected training
and test fractions were considered sufficiently wide to ensure a well-posed
training and post-processing statistical evaluation.

The k-fold cross validation is a technique that mitigates overfitting on the
data and improves the generalization performance of the machine learning
model. Validation can be performed implicitly during training, by enabling
at setup the standard k-fold cross validation mechanism, in which one fold
at a time is left out [32]. The automated cross-validation process consists
in performing k different training runs with the following procedure:
(i) the training set is split into k random subsets, each containing the
same fraction of the data set (depending on the choice of k); (ii) at each
run one subset is excluded and used for validation, while the remaining part
of the data set is used for training. While mitigating overfitting, the
k-fold cross validation increases the execution time by a factor estimable
at around k - 1 with respect to a single training run.
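The splitting procedure described above can be sketched in a few lines. The generator name `k_fold_splits` and the fixed random seed are hypothetical choices; the sketch only produces index sets and leaves the training and validation steps to the caller. Whether the folds were built over images or over voxels is not stated in the text, so the example simply uses 10 generic items.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k runs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold, so fold sizes differ by at most one
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f in range(k) if f != i for j in folds[f]]
        yield train, validation

# Example: 5-fold split of 10 items (for instance, 10 training images)
splits = list(k_fold_splits(10, 5))
```

Each item is used for validation exactly once and for training in the other k - 1 runs.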

Furthermore, the combination of the Bayes rule with the above simplified 
assumption has a positive impact on the model complexity and its computa¬ 
tional time. In particular, the latter property pushed us to choose this model as 
embedded classifier for the feature selection problem. 

The agreement between an automated segmentation estimate and a manual
segmentation can be assessed using overlap measures. A number of measures
are available: (a) Dice index [?], [33]; (b) efficiency; (c) purity of a
class; (d) completeness of a class; (e) contamination of a class.

The statistical indicators adopted are based on the commonly known confusion
matrix, which can be used to easily visualize the classification performance
[34]: each column of the matrix represents the instances in a predicted
class, while each row represents the instances in the real class. One
benefit of a confusion matrix is that it makes it easy to see whether the
system is mixing different classes.

We remark here that we were mostly interested in the feature analysis
related to the classification of the hippocampus class voxels. Therefore we
considered as particularly relevant the Dice index, usually referred to the
true positive class (Naa in our confusion matrix), which in our case
corresponds to the hippocampus class. Since, by definition, the Dice index
does not take the true negative rate into account, the rate of
not-hippocampus voxels is not involved in this indicator. A statistical
evaluation of this latter class, corresponding to the background voxels, has
been included primarily for completeness and for coherence with the full
confusion matrix representation. Giving the highest relevance to the
hippocampus class analysis is also a common evaluation criterion in this
context [3].

In terms of binary classification, we were more interested in performing a
feature selection analysis than in improving the classification
performances. Therefore we set a standard classification threshold of 0.5 at
the beginning of the experiments and kept it unchanged throughout the entire
described process, considering it sufficient for our specific purposes.

More specifically, for a generic two-class confusion matrix,

                     OUTPUT
                     Class A   Class B
TARGET   Class A     Naa       Nab
         Class B     Nba       Nbb


we then use its entries to define the following statistical quantities: 

- total efficiency: te. Defined as the ratio between the number of correctly
classified objects and the total number of objects in the data set. In our
confusion matrix example it would be:

$$ te = \frac{N_{aa} + N_{bb}}{N_{aa} + N_{ab} + N_{ba} + N_{bb}} $$

- purity of a class: pcN. Defined as the ratio between the number of correctly
classified objects of a class and the number of objects classified in that class.
In our confusion matrix example it would be:

$$ pcA = \frac{N_{aa}}{N_{aa} + N_{ba}}, \qquad pcB = \frac{N_{bb}}{N_{ab} + N_{bb}} $$


- completeness of a class: cmpN. Defined as the ratio between the number of
correctly classified objects in that class and the total number of objects of
that class in the data set. In our confusion matrix example it would be:

$$ cmpA = \frac{N_{aa}}{N_{aa} + N_{ab}}, \qquad cmpB = \frac{N_{bb}}{N_{ba} + N_{bb}} $$


- contamination of a class: cntN. It is the dual of the purity, namely the
ratio between the misclassified objects in a class and the number of objects
classified in that class. In our confusion matrix example it would be:

$$ cntA = 1 - pcA = \frac{N_{ba}}{N_{aa} + N_{ba}}, \qquad cntB = 1 - pcB = \frac{N_{ab}}{N_{ab} + N_{bb}} $$



- Dice index: Dice. Also known as the F1 score, it is a frequently used
measure in binary classification, which can be considered as a weighted
average of the purity and completeness, reaching its best value at 1 and its
worst at 0. By referring to our notation, the Dice index is defined as:

$$ Dice = 2 \cdot \frac{pcA \cdot cmpA}{pcA + cmpA} = \frac{2 N_{aa}}{2 N_{aa} + N_{ba} + N_{ab}} $$
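The indicators defined above can be computed directly from the four entries of the confusion matrix. The sketch below is only an illustration: the function name `confusion_metrics` and the example counts are hypothetical, with class A playing the role of the hippocampus class.

```python
def confusion_metrics(naa, nab, nba, nbb):
    """Indicators for a two-class confusion matrix (rows: target, columns: output)."""
    pc_a = naa / (naa + nba)    # purity of class A
    pc_b = nbb / (nab + nbb)    # purity of class B
    cmp_a = naa / (naa + nab)   # completeness of class A
    cmp_b = nbb / (nba + nbb)   # completeness of class B
    return {
        "te": (naa + nbb) / (naa + nab + nba + nbb),
        "pcA": pc_a, "pcB": pc_b,
        "cmpA": cmp_a, "cmpB": cmp_b,
        "cntA": 1.0 - pc_a, "cntB": 1.0 - pc_b,
        "dice": 2.0 * naa / (2.0 * naa + nba + nab),
    }

# Made-up counts: 100 voxels per class
m = confusion_metrics(naa=80, nab=20, nba=40, nbb=60)
```

With these counts the two expressions of the Dice index agree: the purity is 80/120 and the completeness is 80/100, so 2 · pcA · cmpA / (pcA + cmpA) equals 160/220, the same value given by 2Naa / (2Naa + Nba + Nab).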


4 Results 

By using the Naive Bayes Classifier on all 315 input features, the goodness
is estimated in 5-fold cross validation on 10 images. The results, in terms
of the statistics derived from the confusion matrix, are shown in Table 2
and the Dice index is 0.60 ± 0.04.

The PCA applied to the 315 input features returns the principal components
(PCs) ordered by the amount of information they convey. The percentages of
information contained in the first 98 PCs and in the first 197 PCs are 90%
and 99% respectively.
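Choosing how many PCs to keep for a target fraction of information can be sketched as below, under the assumption that the "amount of information" is measured by the variance (eigenvalue) associated with each component. The function name `components_for_fraction` and the eigenvalues in the example are invented for illustration and are not taken from the data of this study.

```python
def components_for_fraction(eigenvalues, fraction):
    """Smallest number of leading PCs whose cumulative variance reaches `fraction`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= fraction:
            return k
    return len(ordered)

# Invented spectrum: four components with variances 5, 3, 1 and 1
n_80 = components_for_fraction([5.0, 3.0, 1.0, 1.0], 0.80)
n_95 = components_for_fraction([5.0, 3.0, 1.0, 1.0], 0.95)
```

Applied to the spectrum of the 315 features, the same rule would give the counts quoted above for the 90% and 99% thresholds.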

Since our goal was to reduce the number of features while retaining the
goodness of the classification, we considered the first 197 PCs, containing
99.0% of the information. The results obtained are shown in Table 3 and the
Dice index is 0.62 ± 0.07. As above, we used the Naive Bayes Classifier in
5-fold cross validation.

Compared to the use of all 315 original features, the values obtained with
197 PCs are on average 6 percentage points lower in terms of Dice index.
Therefore, to avoid loss of information, we considered all 315 PCs. The
results are reported in Table 4 and the Dice index is 0.63 ± 0.03.

Even using all the PCs, the result was 5 percentage points lower in terms of
Dice index. This result confirms what was already found by Golland et al.
in [?], i.e. that the selection of large-variance features performed by the
PCA is not specifically suited for segmentation problems.

4.1 Kolmogorov-Smirnov analysis 

The K-S test provides an estimate of how much two distributions are related
to each other. The K-S test allowed us to select only the features whose
distributions in the hippocampus and not-hippocampus classes have a
correlation of less than 5%, resulting in a total of 57 features.

As above, we used the Naive Bayes Classifier in 5-fold cross validation. The
results obtained are shown in Table 5 and the Dice index is 0.67 ± 0.04.

The K-S test results are comparable with the original parameters space based 
on 315 features. 

4.2 Sequential forward selection and backward elimination 

The two FS methods belonging to the wrapper category that we tested were SFS
and SBE. In Figure 1(a) the ordinate axis shows the top value of the Dice
index achieved among all possible combinations at the reference step shown
on the horizontal axis. At each step, the feature chosen is the one that
achieves the best performance when used in combination with the features
selected in the previous steps. The step number coincides with the number of
selected features (SFS).

In Figure 1(b) the ordinate axis shows the top value of the Dice index
achieved among all possible combinations at the reference step shown on the
horizontal axis. At each step, the feature removed is the one without which
the best performance is obtained. The step number coincides with the number
of eliminated features (SBE).
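The backward procedure just described can be sketched in the same spirit. As before, this is an illustration rather than the code used for the experiments: `sequential_backward_elimination` and `score` are hypothetical names, and `score` stands for any subset criterion, such as the cross-validated Dice index of the classifier.

```python
def sequential_backward_elimination(n_features, score):
    """Greedy backward search: at each step drop the feature whose removal scores best."""
    current = list(range(n_features))
    history = [(list(current), score(current))]  # step 0: nothing eliminated
    while len(current) > 1:
        drop = max(current,
                   key=lambda f: score([g for g in current if g != f]))
        current = [g for g in current if g != drop]
        history.append((list(current), score(current)))
    # The index in `history` equals the number of eliminated features
    best_step = max(range(len(history)), key=lambda s: history[s][1])
    return history[best_step][0], history[best_step][1], best_step

# Toy criterion (assumption): features 1 and 3 are informative, every feature has a cost
def toy_score(subset):
    return len(set(subset) & {1, 3}) - 0.1 * len(subset)

kept, best_score, n_eliminated = sequential_backward_elimination(5, toy_score)
```

In this toy run the best score is reached after eliminating 3 of the 5 features, in the same way as the step number on the horizontal axis of the SBE curve counts the eliminated features.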

We observe that the SFS method reaches its highest Dice index, 0.75, at
step 36. This means that the best performance, using the Naive Bayes
Classifier, is obtained with only 36 selected features, listed in Table 7.

The SBE method reaches its highest Dice index, 0.75, at step 292. Therefore
the best performance, evaluated with the Naive Bayes Classifier, is obtained
by using the remaining 23 features (i.e. 315 - 292), listed in Table 9.

Tables 6 (related Dice index 0.75 ± 0.03) and 8 (related Dice index
0.75 ± 0.02) show the performance at the peak values in Figure 1 for SFS and
SBE respectively.


4.3 Random Forest analysis 

The Random Forest classification methodology allowed us to estimate the
feature importance [?]. To select the best subset we performed a
classification study with a cross validation procedure based on the Naive
Bayes Classifier, varying the threshold on the feature importance index. The
optimal threshold corresponded to the maximum Dice index value and was
achieved with 222 features. Also in this case we used the Naive Bayes
Classifier in 5-fold cross validation to evaluate the features selected by
the Random Forest. The result obtained is shown in Table 10 and the Dice
index is 0.69 ± 0.04.


4.4 Random selection test 

Furthermore, we performed an additional group of tests to evaluate whether
randomly selected samples of 36 features among the original 315 might lead
to Dice indexes greater than or comparable with the Dice value obtained with
SFS (0.75). To do so, we estimated the empirical probability density
function of the Dice index under the null hypothesis that any set S* of 36
features provides a Dice value greater than or equal to the true Dice (the
one obtained with the SFS subset) in predicting whether a voxel belongs to
the hippocampus or not. To test this hypothesis, 2000 sets S* were
generated, each composed of 36 features randomly drawn from the ones
available, and the corresponding Dice values were evaluated. The obtained
results are shown in Figure 4.
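The random selection test can be sketched as an empirical estimate of how often a random subset matches or exceeds a reference score. The function name `random_subset_test`, the seed and the toy `score` are assumptions for illustration; in the study the score would be the Dice index of the classifier on the drawn 36 features, and the reference would be the SFS value of 0.75.

```python
import random

def random_subset_test(n_features, subset_size, score, reference,
                       n_draws=2000, seed=0):
    """Fraction of random subsets scoring at least `reference`, plus all scores."""
    rng = random.Random(seed)
    scores = [score(rng.sample(range(n_features), subset_size))
              for _ in range(n_draws)]
    fraction = sum(s >= reference for s in scores) / n_draws
    return fraction, scores

# Toy criterion (assumption): fraction of the drawn features that are informative
informative = {0, 1, 2}
def toy_score(subset):
    return len(informative & set(subset)) / len(subset)

fraction, scores = random_subset_test(20, 3, toy_score, reference=1.0)
```

The list of scores is the empirical distribution that would be plotted as a histogram, and `fraction` is the empirical probability that a random subset does at least as well as the reference.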


5 Discussion and Conclusion 


The main goal of this work was to verify the possibility of reducing the
number of required voxel features without losing, or even while enhancing,
the classification performances. Moreover, reducing the number of voxel
features could also improve the computational efficiency of the
classification.

As clearly emerged from a recent review [2], feature selection has by now to
be considered an essential step in neuroimaging studies approached with the
machine learning paradigm. Its importance does not depend on the specific
technique used to extract and encode the features from MRI regions of
interest, whether it is based on standard n-dimensional feature vectors or
on a pairwise dissimilarity representation. In the present work we
investigated the application of several feature selection methods.

The results obtained using the different approaches are summarized in
Table 11 and in Figure 5. We observe that, using the two subsets selected by
SFS and SBE, it is possible to obtain higher performances than using the
entire input dataset.

The fraction of random Dice values larger than the best one, with respect to
the total number of random extractions, is zero. However, as can be seen in
Figure 4, in many cases a randomly extracted feature sample gives better
performance than the complete set of 315 features.

Among the FS approaches presented in this work, the SFS and SBE show 
better performances. 

We underline that the results shown in Figure 5 have to be interpreted
mainly as a comparison among the different methods of feature selection.
What has to be stressed is that the performances are influenced by the
feature information content as well as by the image enhancement techniques
employed. A quite simple method, such as the Naive Bayes Classifier, is able
to reach state-of-the-art performances when preceded by a selection analysis
on the feature space. A more detailed study of the classification methods
and of the post-processing techniques that can be used to improve
performances is presented in other studies [37,38].

To test the goodness of the best feature selection methods presented in this
paper, we used the two selected sets, formed respectively by 36 and 23
features, on a blind test database composed of 25 MRIs (i.e. not used in the
training phase), in the algorithm cited in [33] (see Tables [7] and [3]
respectively).

By analyzing the two subsets of selected features, we found that 13 of the
23 features extracted by the SBE method are also present in the sample of 36
features obtained by the SFS technique. Most of them are Haralick and
Statistical features, with the exception of the positional and Haar-like
features, confirming the high importance of the Haralick and Statistical
types as well as the very low contribution of the Haar-like type.

We remark that minimizing the number of Haralick features, in particular the
correlations, improves the processing time and allows a better handling of
the information content. In fact, among the three categories of features
considered here, the Haralick type was the most time consuming from the
computational point of view.

The comparison of our FS methods with the widely used PCA demonstrates the
very low performance of the PCA technique (as shown in Figure 5). This
result is in agreement with the well-known downside of the method in the
presence of highly non-linear feature correlations. It is also an indirect
confirmation of the intrinsic difficulty of separating the hippocampus and
not-hippocampus classes in MRI images.

We conclude that the SFS and SBE techniques are two promising methods 
allowing to reduce the input space size, with a very low loss of information and 
permitting classification performances comparable or even better than the case 
with a larger amount of features. 

In fact, in terms of feature space dimension, Morra [5] performs a
voxel-based segmentation using about 18,000 features with the weighted
voting method AdaBoost [7], tested on a different image data set. In
addition, FreeSurfer [1], which is not a voxel-based method and is
considered a standard benchmark for MRI segmentation experiments, reaches a
Dice value of 0.76 ± 0.05.

In this work, we observed that the features selected by both the SFS and SBE
methods are related to the high frequency component of the image. This
result suggests which kind of features are best suited to high frequency
classification problems such as edge recognition. In fact, these correlation
features, being based on intensity differences, are able to capture local
information based on discontinuity rather than similarity.

Besides, this result is a further suggestion for a future investigation: to put 
in practice a preprocessing procedure to enhance the contours of the structures 
contained in the image and to assess the usefulness of these procedures in the 
diagnosis support systems. 
315 input features   Completeness   Purity       Contamination
                     of a class     of a class   of a class
Hippocampus          79%            62%          38%
Not Hippocampus      63%            80%          20%
Efficiency           70%

Table 2. Classification result on all 315 input features using Naive Bayes Classifier in
5-fold cross validation based on confusion matrix.


197 PCs              Completeness   Purity       Contamination
                     of a class     of a class   of a class
Hippocampus          60%            68%          32%
Not Hippocampus      78%            72%          28%
Efficiency           71%

Table 3. Classification result on the first 197 PCs using Naive Bayes Classifier in
5-fold cross validation based on confusion matrix.


315 PCs              Completeness   Purity       Contamination
                     of a class     of a class   of a class
Hippocampus          86%            51%          49%
Not Hippocampus      36%            78%          22%
Efficiency           58%

Table 4. Classification result on all 315 PCs using Naive Bayes Classifier in
5-fold cross validation based on confusion matrix.


57 features          Completeness   Purity       Contamination
Kolmogorov-Smirnov   of a class     of a class   of a class
Hippocampus          84%            57%          43%
Not Hippocampus      52%            81%          19%
Efficiency           66%

Table 5. Classification result on 57 features selected through Kolmogorov-Smirnov test
using Naive Bayes Classifier in 5-fold cross validation based on confusion matrix.


36 features          Completeness   Purity       Contamination
Forward Selection    of a class     of a class   of a class
Hippocampus          82%            70%          30%
Not Hippocampus      73%            84%          16%
Efficiency           77%

Table 6. Classification result on 36 features selected through forward selection method
using Naive Bayes Classifier in 5-fold cross validation based on confusion matrix.
Fig. 1. Best Dice Index of all the possible combinations of the relevant step in (a)
sequential forward selection and in (b) sequential backward elimination methods.



Fig. 2. Haar-like template types 1 (left) and 2 (right) used in the experiments. 







Fig. 3. Representation of a generic cubic mask used for calculating the gradient
features. The labeled points are either the vertexes of the cube or the median points of
the segments.



Fig. 4. Distribution of 2000 random Dice values compared with the true Dice (shown with
the dashed red line) for the 36 features obtained by the sequential forward
selection.
















36 features          Haralick                           Haar-like   Statistical 
Forward Selection    Orientation  Coordinate  Msize     Type        Msize   Entry 

contrast*            135          Y           3 
gradient*                                                           5       EC 
correlation          135          X           3 
position*                                                                   coords 
norm. gray level*                                                           value 
correlation*         45           X           5 
gradient*                                                           5       DF 
correlation*         90           Y           9 
correlation          45           Y           7 
skewness*                                                           7 
homogeneity*         90           X           9 
correlation          0            Y           5 
correlation          90           Z           5 
correlation*         45           X           3 
correlation          135          Z           9 
correlation          90           Y           5 
correlation          135          Z           5 
correlation          0            Z           7 
correlation          90           Z           7 
correlation          90           Z           9 
correlation          0            Y           3 
correlation          135          X           3 
correlation          0            Z           9 
template*                                               1 
skewness*                                                           5 
correlation          90           Z           3 
correlation          45           X           5 
gradient                                                            3       MN 
template                                                2 
correlation*         45           X           9 
correlation          45           Y           5 
correlation          90           Y           7 
correlation          45           Z           5 
gradient                                                            9       DF 
homogeneity          0            Z           9 
correlation          0            Y           9 

Table 7. Details of the 36 features resulting from the forward selection method using 
the Naive Bayes Classifier. The asterisk indicates the entries also present in the list of 23 
SBE features. For Haralick features, the orientation in degrees, the reference coordinate 
and the size of the cubic mask used are reported. For Haar-like features, the 
entry value indicates the template type used (see Fig. 2). For statistical/positional 
features, the size of the cubic mask used or the self-explanatory value is listed, depending 
on the specific feature type. In particular, for gradients, the column Entry indicates 
the segment of the reference diagonal as shown in Fig. 3. All the features are listed in 
top-down order of their inclusion during the execution of the SFS procedure. 













































23 features            Completeness   Purity       Contamination 
Backward Elimination   of a class     of a class   of a class 

Hippocampus            83%            70%          30% 
Not Hippocampus        73%            85%          15% 

Efficiency             77% 


Table 8. Classification results on the 23 features selected through the backward elimination 
method using the Naive Bayes Classifier in 5-fold cross validation, based on the confusion 
matrix. 


23 features            Haralick                           Haar-like   Statistical 
Backward Elimination   Orientation  Coordinate  Msize     Type        Msize   Entry 

position*                                                                     coords 
norm. gray level*                                                             value 
correlation            0            Y           7 
correlation*           45           X           3 
correlation*           45           X           5 
correlation            45           X           7 
correlation*           45           X           9 
correlation            45           Y           9 
correlation            45           Y           5 
correlation*           90           Y           9 
homogeneity            135          Z           3 
gradient*                                                             5       DF 
gradient                                                              7       DF 
contrast*              135          Y           3 
gradient                                                              9       OP 
homogeneity*           90           X           9 
gradient                                                              3       BH 
skewness*                                                             7 
gradient*                                                             5       EC 
gradient                                                              3       IL 
template*                                                 1 
skewness*                                                             5 
gradient                                                              5       MN 


Table 9. Details of the 23 features resulting from the backward elimination method using 
the Naive Bayes Classifier. The asterisk indicates the entries also present in the list of 36 
SFS features. For Haralick features, the orientation in degrees, the reference coordinate and 
the size of the cubic mask used are reported. For Haar-like features, the entry 
value indicates the template type used (see Fig. 2). For statistical/positional features, 
the size of the cubic mask used and/or the self-explanatory value is listed, depending 
on the specific feature type. In particular, for gradients, the column Entry indicates the 
segment of the reference diagonal as shown in Fig. 3. 








































222 features      Completeness   Purity       Contamination 
Random Forest     of a class     of a class   of a class 

Hippocampus       80%            62%          38% 
Not Hippocampus   62%            80%          20% 

Efficiency        70% 


Table 10. Classification results on the 222 features selected through the Random Forest method 
using the Naive Bayes Classifier in 5-fold cross validation, based on the confusion matrix. 


Method                 Size selected group   Dice Index 

original dataset       315                   0.69 ± 0.04 
PCA selection          197                   0.62 ± 0.07 
K-S selection          57                    0.67 ± 0.04 
Forward Selection      36                    0.75 ± 0.02 
Backward Elimination   23                    0.75 ± 0.02 
Random Forest          222                   0.69 ± 0.04 


Table 11. For each implemented method, the size of the selected group, the mean Dice Index 
(evaluated using the Naive Bayes Classifier) and the related σ are shown. 



Fig. 5. Dice Index comparison for the following methods: Original dataset (315 features for 
each voxel); PCA (197 selected features); K-S test (57 selected features); SFS (36 
selected features); SBE (23 selected features); Random Forest (222 selected features). 
Boxes have lines at the lower quartile, median, and upper quartile values, with whiskers 
extending to 1.5 times the inter-quartile range. Outliers are indicated by a plus sign. 
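
The box plot quantities named in the caption can be computed as in the following minimal sketch. It assumes the common convention in which each whisker ends at the most extreme value lying within 1.5 times the inter-quartile range from the box, and every value beyond that is drawn as an outlier. The quartile interpolation rule and the Dice values are assumptions for illustration only.

```python
import statistics


def box_stats(values):
    """Quartiles, whisker ends and outliers for a box plot with 1.5 * IQR whiskers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # most extreme value still inside the fences
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Hypothetical per-subject Dice values, not taken from the paper.
toy_dice = [0.70, 0.72, 0.74, 0.75, 0.75, 0.76, 0.77, 0.78, 0.50]
stats = box_stats(toy_dice)
```

In this toy example the value 0.50 falls below the lower fence and would be marked with a plus sign, while the whiskers end at 0.70 and 0.78.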

























